(54) Video image compression using weighted wavelet hierarchical vector quantization

(57) A weighted wavelet hierarchical vector quantization (WWHVQ) procedure is initiated by obtaining (12) an N x N pixel image with 8 bits per pixel. A look-up operation (14) is performed to obtain data representing a discrete wavelet transform (DWT) followed by a quantization of the data. Upon completion of the look-up, a data compression will have been performed. Further stages and look-ups will result in further compression of the data, i.e., 4:1, 8:1, 16:1, 32:1, 64:1, ... etc. Accordingly, a determination (16) is made whether the compression is complete. If the compression is incomplete, further look-up is performed. If the compression is complete, however, the compressed data is transmitted (18). Optionally, it is determined (19) at a gateway whether further compression is required. If so, transcoding is performed (20). The receiver receives (22) the compressed data. Subsequently, a second look-up operation (24) is performed to obtain data representing an inverse discrete wavelet transform of the decompressed data. After one iteration, the data is decompressed by a factor of two. Further iterations allow for further decompression of the data. Accordingly, a determination (26) is made whether decompression is complete. If the decompression is incomplete, further look-ups are performed; otherwise the procedure is ended.
This invention relates to a method and apparatus for compressing a video image for transmission to a receiver and/or decompressing the image at the receiver. More particularly, the invention is directed to an apparatus and method for performing data compression on a video image using weighted wavelet hierarchical vector quantization (WWHVQ). WWHVQ advantageously utilizes certain aspects of hierarchical vector quantization (HVQ) and discrete wavelet transform (DWT), a subband transform.

A vector quantizer (VQ) is a quantizer that maps k-dimensional input vectors into one of a finite set of k-dimensional reproduction vectors, or codewords. An analog-to-digital converter, or scalar quantizer, is a special case in which the quantizer maps each real number to one of a finite set of output levels. Since the logarithm (base 2) of the number of codewords is the number of bits needed to specify the codeword, the logarithm of the number of codewords, divided by the vector dimension, is the rate of the quantizer in bits per symbol.

A VQ can be divided into two parts: an encoder and a decoder. The encoder maps the input vector into a binary code representing the index of the selected reproduction vector, and the decoder maps the binary code into the selected reproduction vector.

A major advantage of ordinary VQ over other types of quantizers (e.g., transform coders) is that the decoding can be done by a simple table lookup. A major disadvantage of ordinary VQ with respect to other types of quantizers is that the encoding is computationally very complex. An optimal encoder performs a full search through the entire set of reproduction vectors looking for the reproduction vector that is closest (with respect to a given distortion measure) to each input vector.

For example, if the distortion measure is squared error, then the encoder computes the quantity $\|x - y_j\|^2$ for each input vector x and reproduction vector $y_j$. This results in essentially M multiply/add operations per input symbol, where M is the number of codewords. A number of suboptimal, but computationally simpler, vector quantizer encoders have been studied in the literature. For a survey, see the book by Gersho and Gray, Vector Quantization and Signal Compression, Kluwer, 1992.
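
The full-search encoder described above can be sketched in a few lines. This is only an illustration under assumed values: the function names and the four-entry codebook are hypothetical and do not come from the specification. It makes the cost visible, since every input vector is compared against all M codewords.

```python
def squared_error(x, y):
    """Squared-error distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def full_search_encode(x, codebook):
    """Return the index of the codeword nearest to x (exhaustive search)."""
    best_index, best_dist = 0, float("inf")
    for j, y in enumerate(codebook):
        d = squared_error(x, y)
        if d < best_dist:
            best_index, best_dist = j, d
    return best_index


# Hypothetical 2-dimensional codebook with M = 4 codewords (2 bits per vector).
codebook = [(0, 0), (0, 255), (255, 0), (255, 255)]
index = full_search_encode((10, 240), codebook)

# Decoding is a single table lookup.
reproduction = codebook[index]
```

With M = 4 and k = 2 the rate is log2(4)/2 = 1 bit per symbol, matching the definition of rate given earlier.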

Hierarchical vector quantization (HVQ) is VQ that can encode using essentially one table lookup per input symbol. (Decoding is also done by table lookup.) To the knowledge of the inventors, HVQ has heretofore not appeared in the literature outside of Chapter 3 of the Ph.D. thesis of P. Chang, Predictive, Hierarchical, and Transform Vector Quantization for Speech Coding, Stanford University, May 1985, where it was used for speech. Other methods named "hierarchical vector quantization" have appeared in the literature, but they are unrelated to the HVQ that is considered respecting the present invention.

The basic idea behind HVQ is the following. The input symbols are finely quantized to p bits of precision. For image data, p = 8 is typical. In principle it is possible to encode a k-dimensional vector using a single lookup into a table with a kp-bit address, but such a table would have $2^{kp}$ entries, which is clearly infeasible if k and p are even moderately large. HVQ performs the table lookups hierarchically. For example, to encode a k = 8 dimensional vector (whose components are each finely quantized to p = 8 bits of precision) to 8 bits representing one of M = 256 possible reproductions, the hierarchical structure shown in FIGURE 1a can be used, in which Tables 1, 2, and 3 each have 16-bit inputs and 8-bit outputs (i.e., they are each 64 KByte tables).
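
The table-size arithmetic in this example can be checked with a short sketch. The function names are illustrative only, and the byte count assumes each table entry is stored in whole bytes.

```python
def direct_table_entries(k, p):
    """Entries in a single table addressed by all k*p input bits."""
    return 2 ** (k * p)


def hvq_table_bytes(levels, p=8):
    """Total bytes for `levels` tables, each with a 2p-bit address and a p-bit output."""
    entries_per_table = 2 ** (2 * p)
    bytes_per_entry = (p + 7) // 8  # p output bits rounded up to whole bytes
    return levels * entries_per_table * bytes_per_entry


print(direct_table_entries(8, 8))   # 2**64 entries for the single-table approach
print(hvq_table_bytes(3) // 1024)   # 192, i.e. three 64 KByte tables
```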

A signal flow diagram for such an encoder is shown in FIGURE 1b. In the HVQ of FIGURE 1b, the tables T at each stage of the encoder along with the delays Z are illustrated. Each level in the hierarchy doubles the vector dimension of the quantizer, and therefore reduces the bit rate by a factor of 2. By similar reasoning, the ith level in the hierarchy performs one lookup per $2^i$ samples, and therefore the total number of lookups per sample is at most 1/2 + 1/4 + 1/8 + ... = 1, regardless of the number of levels. Of course, it is possible to vary these calculations by adjusting the dimensions of the various tables.

The contents of the HVQ tables can be determined in a variety of ways. A straightforward way is the following. With reference to FIGURE 1a, Table 1 is simply a table-lookup version of an optimal 2-dimensional VQ. That is, an optimal 2-dimensional full search VQ with M = 256 codewords is designed by standard means (e.g., the generalized Lloyd algorithm discussed by Gersho and Gray), and Table 1 is filled so that it assigns to each of its $2^{16}$ possible 2-dimensional input vectors the 8-bit index of the nearest codeword.

Table 2 is just slightly more complicated. First, an optimal 4-dimensional full search VQ with M = 256 codewords is designed by standard means. Then Table 2 is filled so that it assigns to each of its $2^{16}$ possible 4-dimensional input vectors (i.e., the cross product of all possible 2-dimensional output vectors from the first stage) the 8-bit index of its nearest codeword. The tables for stages 3 and up are designed similarly. Note that the distortion measure is completely arbitrary.
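
A toy version of this table construction may make the procedure concrete. The sketch below assumes p = 2 bits per sample and M = 4 codewords per level so that the tables stay tiny, and it uses hand-picked codebooks rather than ones trained with the generalized Lloyd algorithm. All names and values are illustrative.

```python
from itertools import product


def nearest(vector, codebook):
    """Index of the codeword closest to `vector` in squared error."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(vector, codebook[j])),
    )


def build_table1(codebook2, p):
    """Table 1: every pair of p-bit samples -> index of nearest 2-D codeword."""
    levels = range(2 ** p)
    return {(a, b): nearest((a, b), codebook2) for a, b in product(levels, levels)}


def build_table2(codebook2, codebook4):
    """Table 2: every pair of stage-1 indices -> index of nearest 4-D codeword."""
    indices = range(len(codebook2))
    return {
        (i, j): nearest(codebook2[i] + codebook2[j], codebook4)
        for i, j in product(indices, indices)
    }


def hvq_encode4(x, table1, table2):
    """Encode a 4-D vector with three lookups (two at level 1, one at level 2)."""
    return table2[(table1[(x[0], x[1])], table1[(x[2], x[3])])]


# Hypothetical codebooks (not trained).
codebook2 = [(0, 0), (1, 2), (2, 1), (3, 3)]
codebook4 = [(0, 0, 0, 0), (0, 0, 3, 3), (3, 3, 0, 0), (3, 3, 3, 3)]

table1 = build_table1(codebook2, p=2)
table2 = build_table2(codebook2, codebook4)
code = hvq_encode4((3, 3, 0, 1), table1, table2)
```

All distance computations happen while the tables are built; encoding itself involves only dictionary lookups.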

A discrete wavelet transformation (DWT), or more generally, a tree-structured subband decomposition, is a method for hierarchical signal transformation. Little or no information is lost in such a transformation. Each stage of a DWT involves filtering a signal into a low-pass component and a high-pass component, each of which is critically sampled (i.e., down sampled by a factor of two). A more general tree-structured subband decomposition may filter a signal into more than two bands per stage, and may or may not be critically sampled. Here we consider only the DWT, but those skilled in the art can easily extend the relevant notions to the more general case.

With reference to FIGURE 2a, let $X = (x(0), x(1), \ldots, x(N-1))$ be a 1-dimensional input signal with finite length N. As shown by the tree structure A, the first stage of a DWT decomposes the input signal $X_{L0} = X$ into the low-pass and high-pass signals $X_{L1} = (x_{L1}(0), x_{L1}(1), \ldots, x_{L1}(N/2-1))$ and $X_{H1} = (x_{H1}(0), x_{H1}(1), \ldots, x_{H1}(N/2-1))$, each of length N/2. The second stage decomposes only the low-pass signal $X_{L1}$ from the first stage into the low-pass and high-pass signals $X_{L2} = (x_{L2}(0), x_{L2}(1), \ldots, x_{L2}(N/4-1))$ and $X_{H2} = (x_{H2}(0), x_{H2}(1), \ldots, x_{H2}(N/4-1))$, each of length N/4. Similarly, the third stage decomposes only the low-pass signal $X_{L2}$ from the second stage into low-pass and high-pass signals $X_{L3}$ and $X_{H3}$ of lengths N/8, and so on. It is also possible for successive stages to decompose some of the high-pass signals in addition to the low-pass signals. The set of signals at the leaves of the resulting complete or partial tree is precisely the transform of the input signal at the root. Thus a DWT can be regarded as a hierarchically nested set of transforms.

To specify the transform precisely, it is necessary to specify the filters used at each stage. We consider only finite impulse response (FIR) filters, i.e., wavelets with finite support. L is the length of the filters (i.e., number of taps), and the low-pass filter (the scaling function) and the high-pass filter (the difference function, or wavelet) are designated by their impulse responses, l(0), l(1), ..., l(L-1), and h(0), h(1), ..., h(L-1), respectively. Then at the output of the mth stage,

$$x_{L,m}(i) = l(0)\,x_{L,m-1}(2i) + l(1)\,x_{L,m-1}(2i+1) + \cdots + l(L-1)\,x_{L,m-1}(2i+L-1)$$

$$x_{H,m}(i) = h(0)\,x_{L,m-1}(2i) + h(1)\,x_{L,m-1}(2i+1) + \cdots + h(L-1)\,x_{L,m-1}(2i+L-1)$$

for $i = 0, 1, \ldots, N/2^m - 1$. Boundary effects are handled in some expedient way, such as setting signals to zero outside their windows of definition. The filters may be the same from node to node, or they may be different.
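
One analysis stage as defined by these two sums can be sketched directly. The example assumes a 2-tap averaging/differencing (Haar-like) filter pair purely for illustration, since the specification does not fix the filter coefficients, and it handles the boundary by treating samples outside the signal as zero.

```python
def dwt_stage(x, lo, hi):
    """One DWT stage: filter with lo/hi and keep every second output."""
    n, taps = len(x), len(lo)

    def sample(k):
        # Zero outside the window of definition (one expedient boundary rule).
        return x[k] if k < n else 0.0

    low = [sum(lo[t] * sample(2 * i + t) for t in range(taps)) for i in range(n // 2)]
    high = [sum(hi[t] * sample(2 * i + t) for t in range(taps)) for i in range(n // 2)]
    return low, high


# Assumed example filters (L = 2); not taken from the specification.
HAAR_LO = [0.5, 0.5]
HAAR_HI = [0.5, -0.5]

x_l1, x_h1 = dwt_stage([2.0, 4.0, 6.0, 8.0], HAAR_LO, HAAR_HI)
x_l2, x_h2 = dwt_stage(x_l1, HAAR_LO, HAAR_HI)  # second stage acts on the low band only
```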

The inverse transform is performed by different low-pass and high-pass filters, called reconstruction filters, applied in reverse order. Let l'(0), l'(1), ..., l'(L-1) and h'(0), h'(1), ..., h'(L-1) be the impulse responses of the inverse filters. Then $X_{L,m-1}$ can be reconstructed from $X_{L,m}$ and $X_{H,m}$ as:

$$x_{L,m-1}(2i) = l'(0)\,x_{L,m}(i) + l'(2)\,x_{L,m}(i+1) + h'(0)\,x_{H,m}(i) + h'(2)\,x_{H,m}(i+1)$$

$$x_{L,m-1}(2i+1) = l'(1)\,x_{L,m}(i+1) + l'(3)\,x_{L,m}(i+2) + h'(1)\,x_{H,m}(i+1) + h'(3)\,x_{H,m}(i+2)$$

for $i = 0, 1, \ldots, N/2^m - 1$. That is, the low-pass and high-pass bands are up sampled (interpolated) by a factor of two, filtered by their respective reconstruction filters, and added.

Two-dimensional signals are handled similarly, but with two-dimensional filters. Indeed, if the filters are separable, then the filtering can be accomplished by first filtering in one dimension (say horizontally along rows), then filtering in the other dimension (vertically along columns). This results in the hierarchical decompositions illustrated in FIGURES 2b, showing tree structure B, and 2c, in which the odd stages operate on rows, while the even stages operate on columns. If the input signal $X_{L0}$ is an N x N image, then $X_{L1}$ and $X_{H1}$ are N x (N/2) images, $X_{LL2}$, $X_{LH2}$, $X_{HL2}$, and $X_{HH2}$ are (N/2) x (N/2) images, and so forth.
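
The row-then-column procedure can be sketched as follows, again assuming a Haar-like 2-tap filter pair for illustration. The helper names are hypothetical, and which of the two mixed bands is called LH and which HL is a naming assumption of this sketch.

```python
def analyze(signal, lo, hi):
    """Filter a 1-D signal into low and high bands, downsampling by two."""
    n = len(signal)

    def at(k):
        return signal[k] if k < n else 0.0  # zero outside the signal

    low = [sum(lo[t] * at(2 * i + t) for t in range(len(lo))) for i in range(n // 2)]
    high = [sum(hi[t] * at(2 * i + t) for t in range(len(hi))) for i in range(n // 2)]
    return low, high


def filter_rows(image, lo, hi):
    """Odd stage: filter horizontally along each row."""
    pairs = [analyze(row, lo, hi) for row in image]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def transpose(image):
    return [list(col) for col in zip(*image)]


def filter_columns(image, lo, hi):
    """Even stage: filter vertically along each column."""
    low_t, high_t = filter_rows(transpose(image), lo, hi)
    return transpose(low_t), transpose(high_t)


HAAR_LO, HAAR_HI = [0.5, 0.5], [0.5, -0.5]  # assumed example filters
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # N = 4

x_l1, x_h1 = filter_rows(image, HAAR_LO, HAAR_HI)        # two N x (N/2) images
x_ll2, x_lh2 = filter_columns(x_l1, HAAR_LO, HAAR_HI)    # (N/2) x (N/2) images
x_hl2, x_hh2 = filter_columns(x_h1, HAAR_LO, HAAR_HI)
```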

Moreover, notwithstanding that which is known about HVQ and DWT, a wide variety of video image compression methods and apparatuses have been implemented. One existing method that addresses transcoding problems is the algorithm of J. Shapiro, "Embedded Image Coding using Zerotrees of Wavelet Coefficients," IEEE Transactions on Signal Processing, December 1993, in which transcoding can be done simply by stripping off prefixes of codes in the bit stream. However, this algorithm trades simple transcoding for computationally complex encoding and decoding.

Other known methods lack certain practical and convenient features. For example, these other known video compression methods do not allow a user to access the transmitted image at different quality levels or resolutions during an interactive multicast over multiple rate channels in a simplified system wherein encoding and decoding are accomplished solely by the performance of table lookups.

More particularly, using these other non-embedded encoding video compression algorithms, when a multicast (or simulcast, as applied in the television industry) of a video stream is accomplished over a network, either every receiver of the video stream is restricted to a certain quality (and hence bandwidth) level at the sender, or bandwidth (and CPU cycles or compression hardware) is unnecessarily used by multicasting a number of streams at different bit rates.

In video conferencing (multicast) over a heterogeneous network comprising, for example, ATM, the Internet, ISDN and wireless, some form of transcoding is typically accomplished at the "gateway" between sender and receiver when a basic rate mismatch exists between them. One solution to the problem is for the "gateway"/receiver to decompress the video stream and recompress and scale it according to internal capabilities. This solution, however, is not only expensive but also increases latency by a considerable amount. The transcoding is preferably done in an online fashion (with minimal latency/buffering) due to the interactive nature of the application and to reduce hardware/software costs.

From a user's perspective, the problem is as follows: (a) Sender(i) wants to send a video stream at K bits/sec to M receivers; (b) Receiver(j) wants to receive Sender(i)'s video stream at L bits/sec (L < K); but (c) the image dimensions that Receiver(j) desires or is capable of processing are smaller than the default dimensions that Sender(i) encoded.
It is desirable that any system and/or method to address these problems in interactive video advantageously incorporate (1) inexpensive transcoding from a higher to a lower bit rate, preferably by only operating on a compressed stream, (2) simple bit rate control, (3) simple scalability of dimension at the destination, (4) symmetry (resulting in very inexpensive decode and encode), and (5) a prioritized compressed stream in addition to acceptable rate-distortion performance. None of the current standards (Motion JPEG, MPEG and H.261) possess all of these characteristics. In particular, no current standard has a facility to transcode from a higher to a lower bit rate efficiently. In addition, all are computationally expensive.

The present invention seeks to overcome the aforenoted and other problems and to incorporate the desired characteristics noted above. It is particularly directed to the art of video data compression, and will thus be described with specific reference thereto. It is appreciated, however, that the invention will have utility in other fields and applications.

The present invention provides a method for compressing and transmitting data, the method comprising steps of: receiving the data; successively performing multiple stages of first lookup operations to obtain compressed data at each stage representing vector quantized discrete subband coefficients; and transmitting the compressed data to a receiver.

The method may comprise successively performing i levels of a first table lookup operation to obtain, for example, $2^i$:1 compressed data representing a subband transform, e.g., DWT, of the input data followed by vector quantization thereof.

In accordance with another aspect of the present invention, the compressed data (which may also be transcoded at a gateway) is transmitted to a receiver.

In accordance with another aspect of the present invention, the compressed data is received at a receiver and multiple stages of a second table lookup operation are performed to selectively obtain decompressed data representing at least a partial inverse subband transform of the compressed data.

The invention further provides an apparatus for carrying out the methods as set forth above or in accordance with any of the embodiments described herein.

One advantage of the present invention is that encoding and decoding are accomplished solely by table lookups. This results in very efficient implementations. For example, this algorithm enables 30 frames/sec encoding (or decoding) of CIF (320x240) resolution video on Sparc 2 class machines with just 50% CPU loading.

Another advantage of the present invention is that, since only table lookups are utilized, the hardware implemented to perform the method is relatively simple. An address generator and a limited number of memory chips accomplish the method. The address generator could be a microsequencer, a gate array, an FPGA or a simple ASIC.

The present invention exists in the construction, arrangement, and combination of the various parts of the device, whereby the objects contemplated are attained as hereinafter more fully set forth, specifically pointed out in the claims, and illustrated by way of exemplary embodiments in the accompanying drawings in which:

FIGURE 1a illustrates a table structure of prior art HVQ;

FIGURE 1b is a signal flow diagram illustrating prior art HVQ for speech coding;

FIGURES 2a-c are a graphical representation of a prior art DWT;

FIGURE 3 is a flowchart representing the preferred method of the present invention;

FIGURES 4a-b are graphical representations of a single stage of an encoder performing a WWHVQ in the method of FIGURE 3;

FIGURE 4c is a signal flow diagram illustrating an encoder performing a WWHVQ in the method of FIGURE 3;

FIGURE 5 is a signal flow diagram of a decoder used in the method of FIGURE 3;

FIGURE 6 is a block diagram of a system using 3-D subband coding in connection with the method of FIGURE 3;

FIGURE 7 is a block diagram of a system using frame differencing in connection with the method of FIGURE 3;

FIGURE 8 is a schematic representation of the hardware implementation of the encoder of the method of FIGURE 3;

FIGURE 9 is a schematic representation of another hardware implementation of the encoder of the method of FIGURE 3;

FIGURE 10 is a schematic representation of the hardware implementation of the decoder of the method of FIGURE 3; and

FIGURE 11 is a schematic representation of another hardware implementation of the decoder of the method of FIGURE 3.

Referring now to the drawings wherein the showings are for the purposes of illustrating the preferred embodiments of the invention only and not for purposes of limiting same, FIGURE 3 provides a flowchart of the overall preferred embodiment. It is recognized that the method is suitably implemented in the structure disclosed in the following preferred embodiment and operates in conjunction with software based control procedures. However, it is contemplated that the control procedure be embodied in other suitable mediums.

As shown, the WWHVQ procedure is initiated by obtaining, or receiving in the system implementing the method, input data representing an N x N pixel image where 8 bits represent a pixel (steps 10 and 12). A look-up operation is performed to obtain data representing a subband transform followed by a vector quantization of the data (step 14). In the preferred embodiment, a discrete wavelet transform comprises the subband transform. However, it is recognized that other subband transforms will suffice. Upon completion of the look-up, a data compression has been performed. Preferably, such compression is 2:1. Further stages will result in further compression of the data, e.g., 4:1, 8:1, 16:1, 32:1, 64:1, ... etc. It is appreciated that other compression ratios are possible and may be desired in certain applications. Successive compression stages, or iterations of step 14, compose the hierarchy of the WWHVQ. Accordingly, a determination is made whether the compression is complete (step 16). If the compression is incomplete, further look-up is performed. If the desired compression is achieved, however, the compressed data is transmitted using any known transmitter (step 18). It is then determined at, for example, a network gateway, whether further compression is required (step 19). If so, transcoding is performed in an identical manner as the encoding (step 20). In any event, the receiver eventually receives the compressed data using known receiving techniques and hardware (step 22). Subsequently, a second look-up operation is performed to obtain data representing an inverse subband transform, preferably an inverse DWT, of decompressed data (step 24). After one complete stage, the data is decompressed. Further stages allow for further decompression of the data to a desired level. A determination is then made whether decompression is complete (step 26). If the decompression is incomplete, further look-ups are performed. If, however, the decompression is complete, the WWHVQ procedure is ended (step 28).

The embodiment of FIGURE 3 uses a hierarchical coding approach, advantageously incorporating particular features of hierarchical vector quantization (HVQ) and the discrete wavelet transform (DWT). HVQ is extremely fast, though its performance directly on images is mediocre. On the other hand, the DWT is computationally demanding, though it is known to greatly reduce blocking artifacts in coded images. The DWT coefficients can also be weighted to match the human visual sensitivity in different frequency bands. This results in even better performance, since giving higher weights to the more visually important bands ensures that they will be quantized to a higher precision. The present invention combines HVQ and the DWT in a novel manner, to obtain the best qualities of each in a single system, Weighted Wavelet HVQ (WWHVQ).

The basic idea behind WWHVQ is to perform the DWT filtering using table lookups. Assume the input symbols have already been finely quantized to p bits of precision. For monochrome image data, p = 8 is typical. (For color image data, each color plane can be treated separately, or they can be vector quantized together into p bits.) In principle it is possible to filter the data with an L-tap filter with one lookup per output symbol using a table with an Lp-bit address space. Indeed, in principle it is possible to perform both the low-pass filtering and the high-pass filtering simultaneously, by storing both low-pass and high-pass results in the table. Of course, such a table is clearly infeasible if L and p are even moderately large. We take an approach similar to that of HVQ: for a filter of length L, $\log_2 L$ "substages" of table lookup can be used, if each table has 2p input bits and p output bits. The p output bits of the final substage represent a 2-dimensional vector: one symbol from the low-pass band and a corresponding symbol from the high-pass band. Thus, the wavelet coefficients output from the table are vector quantized. In this sense, the DWT is tightly integrated with the HVQ. The wavelet coefficients at each stage of the DWT are vector quantized, as are the intermediate results (after each substage) in the computation of the wavelet coefficients by table lookup.

FIGURE 4a shows one stage i of the WWHVQ, organized as $\log_2 L$ substages of table lookup. Here, L = 4, so that the number of substages is two. Note that the filters slide over by two inputs to compute each output. This corresponds to downsampling (decimation) by a factor of 2, and hence a reduction in bit rate by a factor of 2. The second stage of the WWHVQ operates on coded outputs from the first stage, again using $\log_2 L$ substages of table lookup (but with different tables than in the first stage), and so on for the following stages. It is recognized that each of a desired number of stages operates in substantially the same way so that the p bits at the output of stage i represent a $2^i$:1 compression. The p bits at the output of the final stage can be transmitted directly (or indirectly via a variable-length code). The transmitted data can be further compressed, or "transcoded," for example, at a gateway between high and low capacity networks, simply by further stages of table lookup, each of which reduces the bit rate by a factor of 2. Hence both encoding and transcoding are extremely simple.
The tables in the first stage (i = 1) may be designed as follows, with reference to FIGURE 4a. In this discussion, we shall assume that the input signal $X = (x(0), x(1), \ldots, x(N-1))$ is one-dimensional. Those skilled in the art will have little difficulty generalizing the discussion to image data. First decompose the input signal $X_{L0} = X$ into low-pass and high-pass signals $X_{L1}$ and $X_{H1}$, each of length N/2. This produces a sequence of 2-dimensional vectors

$$x(i) = [x_{L1}(i),\; x_{H1}(i)], \quad i = 0, 1, \ldots, N/2 - 1,$$

where

$$x_{L1}(i) = l(0)x(2i) + l(1)x(2i+1) + l(2)x(2i+2) + l(3)x(2i+3)$$

and

$$x_{H1}(i) = h(0)x(2i) + h(1)x(2i+1) + h(2)x(2i+2) + h(3)x(2i+3).$$

Such 2-dimensional vectors are used to train an 8-bit vector quantizer $Q_{1.2}$ for Table 1.2. Likewise, 2-dimensional vectors

$$[\,l(0)x(2i) + l(1)x(2i+1),\; h(0)x(2i) + h(1)x(2i+1)\,]$$

are used to train an 8-bit vector quantizer $Q_{1.1a}$ for Table 1.1a, and 2-dimensional vectors

$$[\,l(2)x(2i+2) + l(3)x(2i+3),\; h(2)x(2i+2) + h(3)x(2i+3)\,]$$

are used to train an 8-bit vector quantizer $Q_{1.1b}$ for Table 1.1b. All three quantizers are trained to minimize the expected weighted squared error distortion measure $d(x, y) = [w_{L1}(x_0 - y_0)]^2 + [w_{H1}(x_1 - y_1)]^2$, where the constants $w_{L1}$ and $w_{H1}$ are proportional to the human perceptual sensitivity (i.e., inversely proportional to the just noticeable contrast) in the low-pass and high-pass bands, respectively. Then Table 1.1a is filled so that it assigns to each of its $2^{16}$ possible 2-dimensional input vectors $[x_0, x_1]$ the 8-bit index of the codeword $[y_{L1.1a}, y_{H1.1a}]$ of $Q_{1.1a}$ nearest to $[l(0)x_0 + l(1)x_1,\; h(0)x_0 + h(1)x_1]$ in the weighted squared error sense. Table 1.1b is filled so that it assigns to each of its $2^{16}$ possible 2-dimensional input vectors $[x_2, x_3]$ the 8-bit index of the codeword $[y_{L1.1b}, y_{H1.1b}]$ of $Q_{1.1b}$ nearest to $[l(2)x_2 + l(3)x_3,\; h(2)x_2 + h(3)x_3]$ in the weighted squared error sense. And finally Table 1.2 is filled so that it assigns to each of its $2^{16}$ possible 4-dimensional input vectors (i.e., the cross product of all possible 2-dimensional output vectors from the first substage), for example, $[y_{L1.1a}, y_{H1.1a}, y_{L1.1b}, y_{H1.1b}]$, the 8-bit index of the codeword $[y_{L1}, y_{H1}]$ of $Q_{1.2}$ nearest to $[y_{L1.1a} + y_{L1.1b},\; y_{H1.1a} + y_{H1.1b}]$ in the weighted squared error sense.
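
The filling of Table 1.1a can be sketched at toy scale. The example assumes 2-bit inputs, a four-entry codebook standing in for the trained quantizer, Haar-like values for the first two filter taps, and arbitrary perceptual weights. None of these values come from the specification; they only show how the partial filter outputs and the weighted distortion enter the table design.

```python
def weighted_sq_error(x, y, weights):
    """Weighted squared error: sum of (w * (x - y))**2 over components."""
    return sum((w * (a - b)) ** 2 for w, a, b in zip(weights, x, y))


def build_table_1_1a(levels, l, h, codebook, weights):
    """For each input pair, store the index of the codeword nearest to the
    partial low-pass / high-pass filter outputs computed from that pair."""
    table = {}
    for x0 in levels:
        for x1 in levels:
            target = (l[0] * x0 + l[1] * x1, h[0] * x0 + h[1] * x1)
            table[(x0, x1)] = min(
                range(len(codebook)),
                key=lambda j: weighted_sq_error(target, codebook[j], weights),
            )
    return table


# Illustrative assumptions only.
l_taps = (0.5, 0.5)      # l(0), l(1)
h_taps = (0.5, -0.5)     # h(0), h(1)
codebook = [(0.0, 0.0), (1.5, -1.5), (1.5, 1.5), (3.0, 0.0)]  # stand-in for the trained quantizer
weights = (1.0, 0.5)     # (w_L1, w_H1): low band weighted more heavily

table_1_1a = build_table_1_1a(range(4), l_taps, h_taps, codebook, weights)
```

At the scale described in the text the same loop would run over all $2^{16}$ input pairs and store 8-bit indices, giving a 64 KByte table.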

For a small cost in performance, it is possible to design the tables so that Table 1.1a and Table 1.1b are the same, for instance, Table 1.1, if i = 1 in FIGURE 4b. In this case, Table 1.1 is simply a table lookup version of a 2-dimensional VQ that best represents pairs of inputs $[x_0, x_1]$ in the ordinary (unweighted) squared error sense. Then Table 1.2 is filled so that it assigns to each of its $2^{16}$ possible 4-dimensional input vectors (i.e., the cross product of all possible 2-dimensional output vectors from the first substage), for example, $[y_0, y_1, y_2, y_3]$, the 8-bit index of the codeword $[y_{L1}, y_{H1}]$ of $Q_{1.2}$ nearest to $[l(0)y_0 + l(1)y_1 + l(2)y_2 + l(3)y_3,\; h(0)y_0 + h(1)y_1 + h(2)y_2 + h(3)y_3]$ in the weighted squared error sense. Making Tables 1.1a and 1.1b the same would result in a savings of both table memory and computation, as shown in FIGURE 4b. The corresponding signal flow diagram is shown in FIGURE 4c.

Referring again to FIGURE 4a, the tables in the second stage (i = 2) are just slightly more complicated. Decompose the input signal $X_{L1}$, of length N/2, into low-pass and high-pass signals $X_{L2}$ and $X_{H2}$, each of length N/4. This produces a sequence of 4-dimensional vectors $x(i) = [x_{L2}(i), x_{H2}(i), x_{H1}(2i), x_{H1}(2i+1)]$, for $i = 0, 1, \ldots, N/4 - 1$, where

$$x_{L2}(i) = l(0)x_{L1}(2i) + l(1)x_{L1}(2i+1) + l(2)x_{L1}(2i+2) + l(3)x_{L1}(2i+3)$$

and

$$x_{H2}(i) = h(0)x_{L1}(2i) + h(1)x_{L1}(2i+1) + h(2)x_{L1}(2i+2) + h(3)x_{L1}(2i+3).$$

Such 4-dimensional vectors are used to train an 8-bit vector quantizer $Q_{2.2}$ for Table 2.2. Likewise, 4-dimensional vectors $[l(0)x_{L1}(2i) + l(1)x_{L1}(2i+1),\; h(0)x_{L1}(2i) + h(1)x_{L1}(2i+1),\; x_{H1}(2i),\; x_{H1}(2i+1)]$ are used to train an 8-bit vector quantizer $Q_{2.1a}$ for Table 2.1a, and 4-dimensional vectors $[l(2)x_{L1}(2i+2) + l(3)x_{L1}(2i+3),\; h(2)x_{L1}(2i+2) + h(3)x_{L1}(2i+3),\; x_{H1}(2i+2),\; x_{H1}(2i+3)]$ are used to train an 8-bit vector quantizer $Q_{2.1b}$ for Table 2.1b. All three quantizers are trained to minimize the expected weighted squared error distortion measure $d(x, y) = [w_{L2}(x_0 - y_0)]^2 + [w_{H2}(x_1 - y_1)]^2 + [w_{H1}(x_2 - y_2)]^2 + [w_{H1}(x_3 - y_3)]^2$, where the constants $w_{L2}$, $w_{H2}$ and $w_{H1}$ are proportional to the human perceptual sensitivities in their respective bands. Then Table 2.1a is filled so that it assigns to each of its $2^{16}$ possible 4-dimensional input vectors $[y_0, y_1, y_2, y_3]$ the 8-bit index of the codeword of $Q_{2.1a}$ nearest to $[l(0)y_0 + l(1)y_1,\; h(0)y_0 + h(1)y_1,\; y_2,\; y_3]$ in the weighted squared error sense, and so on for Tables 2.1b and 2.2 in the second stage, and the tables in any succeeding stages.

As in HVQ, in WWHVQ the vector dimension doubles with each succeeding stage. The formats of the vectors at each stage are shown as outputs of the encoder 30, graphically represented in FIGURE 4c. These formats are for the case of the two-dimensional separable DWT shown in FIGURE 2b.

EP 0 701 375 A2

Referring now to the case where all tables in a given substage are identical, as in FIGURES 4b and 4c, if the number of inputs to a stage is S bytes, then the number of outputs is:

$$S/2$$

and the number of table lookups per output is log L. If the compression ratio is C, then the total number of outputs (including the outputs of intermediate stages) is:

$$N^2 (C-1)/C$$

for an image of size $N^2$. Thus, the total number of table lookups is:

$$\left(N^2 (C-1) \log L\right)/C$$

If the amount of storage needed for the HVQ encoder is T, then the amount of storage needed for the WWHVQ encoder is T log L per stage.
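
As a quick numerical check of these encoder counts, the following sketch evaluates them for an assumed 256 x 256 image, a 16:1 compression ratio and 4-tap filters. It assumes the logarithm is base 2, consistent with the $\log_2 L$ substages described earlier; the function names are illustrative.

```python
from math import log2


def encoder_outputs(n, c):
    """Total outputs over all stages: N^2/2 + N^2/4 + ... + N^2/C = N^2 (C-1)/C."""
    return n * n * (c - 1) // c


def encoder_lookups(n, c, taps):
    """Total table lookups: log2(L) lookups for each output."""
    return encoder_outputs(n, c) * log2(taps)


outputs = encoder_outputs(256, 16)
lookups = encoder_lookups(256, 16, 4)
```

For these assumed values the encoder produces 61,440 outputs in total and performs 122,880 lookups, i.e. fewer than two lookups per input pixel.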

Also shown in FIGURE 4c are the respective delays Z, which successively increase with each stage. The oval symbol including the ↓2 designation indicates that only one of every two outputs is selected, as those skilled in the art will appreciate.

The WWHVQ decoder 50, which performs steps 22-26 of FIGURE 3, is shown in FIGURE 5. All tables of decoder 50 in a given substage are identical, similar to the encoder of FIGURES 4b and 4c. As those skilled in the art will appreciate, the odd-even split tables 52, 54 at each stage handle the interpolation by 2 that is part of the DWT reconstruction. If L = 4 and the filter coefficients are h(i) and l(i), i = 0, 1, 2, 3, then the odd table 52 computes h(1)x(i) + h(3)x(i+1) and l(1)x(i) + l(3)x(i+1), while the even table 54 computes h(0)x(i) + h(2)x(i+1) and l(0)x(i) + l(2)x(i+1), where the x(i)'s are the inputs to that stage. If the number of inputs to a stage is S, then the number of outputs from that stage is 2S. The total number of table lookups for the stage is:

$$S \log(L/2)$$

If the compression ratio is C, then the total number of outputs (including outputs of intermediate stages) is:

$$2N^2 (C-1)/C$$

for an image of size $N^2$. Thus, the total number of table lookups is:

$$\left(N^2 (C-1)/C\right) \log(L/2)$$

If the amount of storage needed for the HVQ encoder is T, then the amount of storage needed for the WWHVQ decoder per stage is:

$$T(\log(L/2) + 1) = T \log L$$
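
The quantities held in the odd and even decoder tables for L = 4 can be written out as plain arithmetic. This sketch assumes the reading in which the odd table combines taps 1 and 3 and the even table combines taps 0 and 2 of each filter. A real decoder would precompute these values for every possible input pair and store them in tables; the function names and coefficient values are made up for the example.

```python
def even_table_entry(x_i, x_next, l, h):
    """Values the even table would hold for the input pair (x(i), x(i+1))."""
    return (h[0] * x_i + h[2] * x_next, l[0] * x_i + l[2] * x_next)


def odd_table_entry(x_i, x_next, l, h):
    """Values the odd table would hold for the input pair (x(i), x(i+1))."""
    return (h[1] * x_i + h[3] * x_next, l[1] * x_i + l[3] * x_next)


# Made-up integer coefficients chosen only to make the arithmetic easy to follow.
l_coeffs = (1, 2, 3, 4)
h_coeffs = (5, 6, 7, 8)

even = even_table_entry(1, 10, l_coeffs, h_coeffs)
odd = odd_table_entry(1, 10, l_coeffs, h_coeffs)
```

Each input pair therefore yields one even-position and one odd-position result, which is how a stage with S inputs produces 2S outputs.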

All the storage requirements presented are for 8 bits per pixel input. For color images (YUV, 4:2:2 format) the storage requirements double. Similarly, the number of table lookups also doubles for color images.

As shown in FIGURES 6 and 7, there are two options for handling motion and interframe coding using the present method. In a first mode (FIGURE 6), the subband coding is extended and followed by a vector quantization scheme that allows for intraframe coding performed as described in connection with encoder 30 to interframe coding, designated by reference numeral 80. This is similar to 3-D subband coding.

The second way (FIGURE 7) of handling motion is to use a simple frame differencing scheme that operates on the
compressed data and uses an ordered codebook to decide whether to transmit a certain address. Specifically, the
WWHVQ encoder 30 shown in FIGURE 4c is used in conjunction with a frame differencer 70. The frame differencer 70
uses a current frame encoded by encoder 30 and a previous frame 72 as inputs to obtain a result.

Some of the features of the WWHVQ of the present invention are:

Transcoding 

The sender transmits a video stream compressed at 16:1. The receiver requests the "gateway" for a 32:1 stream
(or the gateway sees that the slower link can only handle 32:1). All the gateway has to do to achieve this transcoding
is do a further level of table lookup using the data it receives (at 16:1) as input. This is extremely useful, especially when
a large range of bandwidths has to be supported, as, for example, in a heterogeneous networked environment.
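
The gateway step can be pictured as one more pass of pairwise table lookup over the codes it receives. The sketch below is illustrative only: `transcode_one_level` is a hypothetical name, the table is passed in as a nested list indexed by a pair of adjacent codes, and the pairing of horizontally adjacent codes is an assumption (the text does not say which neighbors are combined at this level).

```python
def transcode_one_level(codes, table):
    """Apply one further lookup level, halving the number of codes.

    codes : received code indices (e.g. the 16:1 stream)
    table : table[a][b] gives the single code replacing the pair (a, b)
    """
    if len(codes) % 2:
        raise ValueError("expected an even number of codes")
    # Each adjacent pair of codes addresses the table and yields one code.
    return [table[a][b] for a, b in zip(codes[0::2], codes[1::2])]
```

Because the output has half as many codes as the input, a 16:1 stream becomes a 32:1 stream without decoding to pixels.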

Dimension Scaling

If the video stream is compressed up to J stages, then the receiver has a choice of $\lfloor J/2 \rfloor + 1$ image sizes without any
extra effort. For example, if a 256 x 256 image is compressed to 6 stages (i.e., 64:1), then the receiver can reconstruct
a 32 x 32 or 64 x 64 or 128 x 128 or 256 x 256 image without any overhead. This is done by just using the low pass
bands LL available at the even numbered stages (see FIGURE 3). Also, since the whole method is based upon table
look-ups, it is very easy to scale the image up, i.e., the interpolation and lowpass filtering can be done ahead of time.
Note also that all this is done on the compressed bit-stream itself. In standard video compression schemes both down
and up scaling is achieved by explicitly filtering and decimating or interpolating the input or output image.
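
The set of image sizes available to the receiver can be enumerated directly. This sketch assumes that each stage compresses by 2:1 and that every pair of stages (one along rows, one along columns) halves both image dimensions, so that 64:1 corresponds to 6 stages; the function name `available_sizes` is hypothetical.

```python
def available_sizes(n, num_stages):
    """List the side lengths a receiver can reconstruct from an n x n image
    compressed through num_stages stages, using the LL band at each even stage."""
    return [n >> k for k in range(num_stages // 2 + 1)]
```

For a 256 x 256 image and 6 stages this gives the four sizes 256, 128, 64 and 32 named in the example.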

Motion

There are two simple options for handling motion. Simple frame differencing can be applied to the compressed data
itself. Another option is to use a 3-D subband coding scheme in which the temporal dimension is handled using another
WWHVQ in conjunction with the spatial WWHVQ. The wavelet used here can be different (it is different in practice). In
a preferred implementation, motion detection and thresholding are accomplished. This is done on the compressed data
stream. The current compressed frame and the previous compressed frame are compared using a table lookup. This
generates a binary map which indicates places where motion has occurred. This map is used to decide which blocks
(codewords) to send and a zero is inserted in place of stationary blocks. This whole process is done in an online manner
with a run-length encoder. The table that is used to create the binary map is constructed off-line and the motion threshold
is set at that time.
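
The motion detection steps above can be sketched as follows. Everything named here is hypothetical: the codebook is reduced to one scalar value per codeword so that "distance" is a simple absolute difference, code 0 is assumed to be reserved for stationary blocks, and the run-length format (value, count) is one of many possible choices.

```python
def build_motion_table(codebook, threshold):
    """Off-line step: table[cur][prev] is 1 where the two codewords differ
    by more than the motion threshold, else 0."""
    size = len(codebook)
    return [[1 if abs(codebook[a] - codebook[b]) > threshold else 0
             for b in range(size)]
            for a in range(size)]


def difference_frame(current, previous, table):
    """On-line step: keep codes where motion occurred, insert 0 elsewhere."""
    return [c if table[c][p] else 0 for c, p in zip(current, previous)]


def run_length_encode(codes):
    """Collapse repeated codes into (value, count) pairs."""
    runs = []
    for value in codes:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]
```

A short usage example with a four-entry codebook:

```python
table = build_motion_table([0, 10, 12, 50], threshold=5)
sent = difference_frame([1, 2, 3, 3], [1, 1, 1, 3], table)
packed = run_length_encode(sent)
```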

Dithering 

Typically, the decoder has to do color space conversion and possibly dithering (for < 24-bit displays) on the decoded
stream before it can display it. This is quite an expensive task compared to the complexity of the WWHVQ decoder. But
since the WWHVQ decoder is also based upon table lookups, these steps are incorporated into the decoder output
tables. Other than speeding up the decoder, the other major advantage of this technique is that it makes the decoder
completely independent of the display (resolution, depth, etc.). Thus, the same compressed stream can be decoded by
the same decoder on two different displays by just using different output tables.

The WWHVQ method is very simple and inexpensive to implement in hardware. Basically, since the method
predominantly uses only table lookups, only memory and address generation logic is required. The architectures described
below differ only in the amount of memory they use versus the amount of logic. Since alternate stages operate on row
and column data, there is a need for some amount of buffering between stages. This is handled explicitly in the
architectures described below. All the storage requirements are given for 8 bits per pixel input. For color images (YUV, 4:2:2
format) the storage requirements double. But the timing information remains the same, since the luminance and
chrominance paths can be handled in parallel in hardware. Also, as noted above, two simple options are available for interframe
coding in WWHVQ. The 3-D subband coding option is implemented as an additional WWHVQ module, while the frame
differencing option is implemented as a simple comparator.

Referring now to FIGURE 8, each one of the tables i.1 (or i.1a and i.1b) and i.2 of the present invention is mapped
onto a memory chip (64KB in this case). The row-column alternation between stages is handled by using a buffer 60 of
NL bytes between stages, where N is the row dimension of the image and L is the length of the wavelet filter. For example,
between stages 1 and 2 this buffer is written in row format (by stage 1) and read in column format by stage 2. Some
simple address generation logic is required. Accordingly, address generator 90 is provided. The address generator 90
is any suitable device such as an incrementer, an accumulator, or an adder plus some combinational glue logic. However,
this architecture is almost purely memory. The input image is fed as the address to the first memory chip whose output
is fed as the address to the next memory chip and so on.

The total memory requirements for the encoder 30 and the decoder 50 are T log L + NL(M - 1) bytes, where M is
the number of stages. For example, the WWHVQ encoder and decoder shown in FIGURES 3 and 5 need 64KB for each
table i.1, i.2 plus the buffer memory 60. If the number of stages, M, is 6, the wavelet filter size, L, is 4 and the image row
dimension, N, is 256, then the amount of memory needed is 768KB + 5KB = 773KB. The number of 64KB chips needed
is 12 and the number of 1KB chips needed is 5. The throughput is obviously maximal, i.e., one output every clock cycle.
The latency per stage is NL clocks, except for the first stage which has a latency of 1 clock cycle. Thus the latency after
m stages is (m - 1)NL + 1 clocks.
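
The arithmetic of this example can be reproduced in a few lines. This is a sketch under stated assumptions: the logarithm is taken as base 2, and T = 384KB is inferred from the text's figure of 768KB for T log L with L = 4 (twelve 64KB tables); the function names are hypothetical.

```python
from math import log2

KB = 1024


def pipelined_memory_bytes(t, filter_len, n, num_stages):
    """T log L bytes of tables plus an NL-byte buffer between each pair of stages."""
    return int(t * log2(filter_len)) + n * filter_len * (num_stages - 1)


def pipelined_latency_clocks(m, n, filter_len):
    """Latency after m stages: NL clocks per stage after the first, plus 1."""
    return (m - 1) * n * filter_len + 1


# Worked example from the text: M = 6, L = 4, N = 256, T = 384KB (inferred).
total = pipelined_memory_bytes(384 * KB, 4, 256, 6)   # 773KB
latency = pipelined_latency_clocks(6, 256, 4)         # clocks after 6 stages
```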

The main advantage of this architecture is that it requires almost no glue logic and is a simple flow-through
architecture. The disadvantages are that the number of chips required scales with the number of stages and the board area
required becomes quite large. The capacity of each chip in this architecture is quite small and one can easily buy a
cheap memory chip which has the capacity of all these chips combined. This is considered in the architecture of FIGURE
9.

Referring now to FIGURE 9, if all the tables are loaded onto one memory chip 100, then an address generator 102
and a frame buffer 104 are used. In this architecture one stage of the WWHVQ is computed at a time. In fact, each
sub-level of a stage is computed one at a time. So in the encoder 30 the table lookups for table i.1 are done first, and the
$N^2/2$ values that result are stored in the (half) frame buffer 104. These values are then used as input for computing the table
lookups of table i.2 and so on. Clearly, the most frame storage needed is $N^2/2$ bytes. The frame buffer 104 also permits
simple row-column alternation between stages.

The address generator 102 has to generate two addresses, one for the table memory chip 106 and another for the
frame memory chip 104. The address for the frame memory chip 104 is generated by a simple incrementer, since the
access is uniform. If the number of taps in the wavelet filter is L, then the number of levels per stage (i.e., wavelet tables)
for the encoder 30 is log L (it is log L/2 for the decoder). Each of the tables i.1 and i.2 is of size tsize. Suppose the output of
level q of stage m is being computed. If the output of the frame memory 104 is y(i), then the address that the address generator
102 computes for the table memory 106 is y(i) + offset, where offset = [(m - 1) log L + (q - 1)] * tsize. This is assuming the
tables i.1, i.2 are stored in an ordered manner. The multiplications involved in the computation of offset need not be
done since offset can be maintained as a running total. Each time one level of computation is completed, offset = offset
+ tsize. Thus, the address generator 102 is any suitable device such as an incrementer, an accumulator, or an adder
plus some combinatorial glue logic.
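
The running-total form of the offset can be checked against the closed form offset = [(m - 1) log L + (q - 1)] * tsize. The sketch below is illustrative only: the function names are hypothetical, the logarithm is assumed to be base 2, and the tables are assumed to be stored consecutively in processing order, as the text requires.

```python
from math import log2


def table_address(y, m, q, filter_len, tsize):
    """Closed form: address into the table memory for frame-memory output y
    at level q of stage m (both counted from 1)."""
    levels = int(log2(filter_len))
    return y + ((m - 1) * levels + (q - 1)) * tsize


def running_offsets(num_stages, filter_len, tsize):
    """Yield (m, q, offset) in processing order without any multiplication:
    the offset is simply incremented by tsize after each level."""
    levels = int(log2(filter_len))
    offset = 0
    for m in range(1, num_stages + 1):
        for q in range(1, levels + 1):
            yield m, q, offset
            offset += tsize
```

With 6 stages, L = 4 and 64KB tables, the generator steps through twelve table base addresses, each agreeing with the closed form evaluated at y = 0.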

The total memory requirement for this architecture is:

$$T \log L + N^2/2$$

bytes spread out over two chips, the frame memory 104 and the table memory 106. For the example considered in the
previous section this translates to 800KB of memory (768KB + 32KB). The throughput is one output every clock cycle.
The latency per stage is:

$$\left(N^2/2^{m-1}\right) \log L,$$

where m is the stage number. The first stage has a latency of just log L. Thus, the latency after m stages is:

$$N^2 \left(1 - 1/2^{m-1}\right) \log L + 1$$
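
The single-chip figures can be evaluated the same way. This sketch assumes a base-2 logarithm, reads the garbled per-stage latency as $(N^2/2^{m-1}) \log L$ (the form consistent with the cumulative expression), and again takes T = 384KB as inferred from the earlier example; the function names are hypothetical.

```python
from math import log2

KB = 1024


def single_chip_memory_bytes(t, filter_len, n):
    """T log L bytes of tables plus a half frame buffer of N^2/2 bytes."""
    return int(t * log2(filter_len)) + n * n // 2


def single_chip_latency_clocks(m, n, filter_len):
    """Latency after m stages: N^2 (1 - 1/2^(m-1)) log L + 1 clocks."""
    return int(n * n * (1 - 1 / 2 ** (m - 1)) * log2(filter_len)) + 1


# Same example as before: L = 4, N = 256, T = 384KB (inferred).
total = single_chip_memory_bytes(384 * KB, 4, 256)    # 768KB + 32KB
```

The half frame buffer contributes 32KB here, against 5KB of inter-stage buffers in the multi-chip design, which is the price paid for computing one full stage at a time.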

The advantages of this architecture are its scalability, simplicity and low chip count. By using a large memory chip
100 (approximately 2MB or more), various configurations of the encoder and the decoder, i.e., various table sizes and
precision, may be considered, and the number of stages can be scaled up or down without having to change anything
on the board. The only disadvantage is latency, but in practice the latency is well below 50 milliseconds, which is the
threshold above which humans start noticing delays.

It is important to note the connection between the requirement of a half frame memory 104 and the latency. The
latency is there primarily due to the fact that all the outputs of a stage are computed before the next stage computation
begins. These intermediate outputs are stored in the frame memory 104. The reason the latency was low in the previous
architecture was that the computation proceeded in a flow-through manner, i.e., computation of the outputs of stage
m began before all the outputs of stage m - 1 were computed.

FIGURE 10 illustrates the architecture for the decoder similar to the encoder of FIGURE 8. As shown, an address
generator 90 is connected to a buffer 60' having inputs of the odd and even tables 52 and 54. The output of the buffer
connects to the next stage and the address generator also connects to the next buffer.

FIGURE 11 shows the architecture for the decoder similar to the encoder of FIGURE 9. As illustrated, the interframe
coder 80' is simply placed at the input to the chip 100', as opposed to the output.

The memory chips utilized to facilitate the look-up tables of the present invention are preferably read only memories
(ROMs). However, it is recognized that other suitable memory means such as PROM, EPROM, EEPROM, RAM, etc.
may be utilized.

Claims 

1. A method for compressing and transmitting data, the method comprising steps of:

receiving the data;

successively performing multiple stages of first lookup operations to obtain compressed data at each stage
representing vector quantized discrete subband coefficients; and,
transmitting the compressed data to a receiver.

2. The method according to claim 1 further comprising:

receiving the compressed data at the receiver;

successively performing multiple stages of second lookup operations to selectively obtain at each stage before
a last stage decompressed data representing a partial inverse subband transform of the compressed data.

3. The method of claim 1 or 2, wherein i lookup stages are performed and the compressed data is $2^i$:1 compressed
data, where i is an integer.

4. The method according to claim 1, 2 or 3, wherein the subband transform coefficients comprise discrete wavelet
transform coefficients.

5. The method of any of claims 1 to 4, further comprising transcoding the compressed data after the transmitting at a
gateway to the receiver to obtain further compressed data.

6. A method adaptable for use on signal data compressed by performing a subband transform followed by a vector
quantization of the signal data, comprising steps of:

receiving the compressed signal data;

successively performing multiple stages of lookup operations to selectively obtain at each stage before a last
stage partially decompressed data representing a partial subband transform of the compressed signal data.

7. The apparatus for compressing and transmitting data, comprising:

means for receiving the data;

means for successively performing multiple stages of first lookup operations to obtain compressed data at
each stage representing vector quantized discrete subband coefficients; and,
means for transmitting the compressed data to a receiver.

8. The apparatus according to claim 7, further comprising:

means for receiving the compressed data at the receiver; and

means for successively performing multiple stages of second lookup operations to selectively obtain at each
stage before a last stage decompressed data representing a partial inverse subband transform of the compressed
data.

9. An apparatus adaptable for use on signal data compressed by performing a subband transform followed by a vector
quantization of the signal data, comprising:

means for receiving the compressed signal data;

means for successively performing multiple stages of lookup operations to selectively obtain at each stage
before a last stage partially decompressed data representing a partial subband transform of the compressed signal
data.

10. A programmable apparatus for compressing and receiving data, when suitably programmed for carrying out the
method of any of claims 1 to 6.